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ABSTRACT 

We investigate the potential of exploiting Lya Emitters (LAEs) to constrain the volume- 
weighted mean neutral hydrogen fraction of the IGM, Shi, at high redshifts (specifically 
z ~ 9). We use "semi-numerical" simulations to efficiently generate density, velocity, and 
halo fields at z = 9 in a 250 Mpc box, resolving halos with masses M > 2.2 x 10 8 M Q . We 
construct ionization fields corresponding to various values of Shi- With these, we generate 
LAE luminosity functions and "counts-in-cell" statistics. As in previous studies, we find that 
LAEs begin to disappear rapidly when Shi > 0.5. Constraining 5im(z = 9) with luminos- 
ity functions is difficult due to the many uncertainties inherent in the host halo mass «-> Lya 
luminosity mapping. However, using a very conservative mapping, we show that the number 
densities derived using the six z ~ 9 LAEs recently discovered by Stark et al. (2007) im- 
ply Shi < 0.7. On a more fundamental level, these LAE number densities, if genuine, require 
substantial star formation in halos with M < 10 9 M Q , making them unique among the current 
sample of observed high-z objects. Furthermore, reionization increases the apparent cluster- 
ing of the observed LAEs. We show that a "counts-in-cell" statistic is a powerful probe of 
this effect, especially in the early stages of reionization. Specifically, we show that a field of 
view (typical of upcoming IR instruments) containing LAEs has >10% higher probability of 
containing more than one LAE in a Shi > 0.5 universe than a Shi ~ universe with the same 
overall number density. With this statistic, an ionized universe can be robustly distinguished 
from one with Shi ^ 0.5 using a survey containing only ~ 20-100 galaxies. 
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1 INTRODUCTION 

The reionization of hydrogen in the intergalactic medium (IGM) 
is a landmark event in the early history of structure formation, be- 
cause it defines the moment at which galaxies (and black holes) 
affected every baryon in the Universe. As such, it has received a 
great deal of attention - both observationally and theoretically - in 
the past several years. U nfortunately, the existing observational ev- 
idence is enigmatic (see lFan et alj|2003 for a recent review). Elec- 
tron scattering of cosmic microwave background photons implies 
that re ionization occur red at z ~ 10, albeit with a large uncer- 
tainty JPage et alliOQrj) . On the other hand, quasars at z ~ 6 show 
some evidence for a ra pid transition in the globally-averaged neu- 
tral fr action, aim (e.g jFan et al ] |2006l : lMesinger & Haimar]|2004 
2007). However the Lya absorption is so saturated in the Gunn- 
Peterso n (GP) trough that constraints deriv ed from that spectral 
region ( Fan et aDl2006 ; Maselli et al.ll2007h are difficult to inter- 
pret (e.g, iLidz et alJkOQrj : iBecker et afll2007l : iBolton & Haehneltl 
l2007h . 

Of particular recent interest have been efforts to constrain 



reionization (and star formation at high redshifts) through searches 
for distant galaxies. Currently, the most efficient way to find 
distant galaxies is by searching for Lya emission lines (which 
result f rom the re-processing of ionizing photons inside the 
galaxy; Partridge & Peebles] 1 19671) : such surveys now routinely 

reach z > 6, where constraints on reionization become interest- 

ir 



> 

ing (e.g., iHu et a l. 2002; Kodai ra et al 



Santos etal. 2004; Stairway et al. 



20031: iRh oads et alj 200- 
2004 iTaniguchi et al.l I2OO. 



redshifts i Willis & Courbin|2005 


;llve et alj2006l:ICubv et al. 2007; 


IStark et alj|2007l: bta et alJko07 


). They offer a number of ad van- 



tages over more traditional techniques. First, narrowband searches 
reduce the sky background, especially if placed between the 
bright sky lines t hat (nearly) blanket the near-infrared sky (e.g., 
lBartonetal]|2004h . Second, they efficiently select galaxies at a 
known redshift (albeit with some contamination by lower-redshift 
interlopers). Third, they increase the signal-to-noise by focusing on 
an emission line. Of course, the disadvantage is that extraordinar- 
ily deep spectroscopic follow-up is required to study the detailed 
properties of the sources (for an illustration of the difficulties, see 
IStark et al 1 20071) . 
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However, these properties are relatively unimportant for 
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studying the ionization state of the IGM, which can be measured 
from the Lya line photons themselves. In particular, these are ab- 
sorbed if they pass through neutral gas near the galaxy. This is a 
consequence of the enormous Lya opt ical depth of a neutral IG M: 
r G p ~ 6.5 x 10 5 Shi [(1 + z)/10] 3/2 dGunn & Petersonl 1965b . so 
even those photons passing through the damping w ing of the Lya 
resonance will be absorbed (Miralda-Escud e 1998). 

Thus, as the IGM becomes more neutral, the Lya selec- 
tion technique will detect fewer and fewer objects (even after 
accounting for cosmological evolution in their intrinsic abun- 
dance); the number of such galaxies therefore measures xm 
faaiman&Sp^nlll999l) . The optical depth encountered by a 
galaxy's Lya photons depends primarily on the extent of the HII 
region that surrounds it: the photons redshift as they stream through 
the ionized gas (suffering little absorption), so they are somewhere 
in the wings of the line by the time they encounter the neutral gas. 
Thus, the amount of absorption depends sensitively on the size dis- 
tribution of ionized bubbles during r eionization. Early work treated 
each galaxy or quasar in isolation l Madau & Reesll200ol : iHaimanl 
12002: Santos 20ol: lHaiman & Cenll2005l) 7 so the HII regions were 
rather small even late in reionization. Including cl ustering dramat- 
ically increases the sizes of the ionized bubbles dFurlanetto et al.l 
2004), whic h allows Lya galaxies to be visible farther back in 
reionization dFurlanetto et alj2004l2006l : lMcOuinn et al.ll2007l) . 

The current obser vational picture is ambiguous. 
Malhotra & Rhoadsl |2004) compared luminosity functions of 
Lyman a-emitters (LAEs) at z — 5.7 and z — 6.5 (bracketing the 
time at which quasar spectra indicate that reionization ends) and 
found no evolution in the number density over this range, requiring 
a substantial ionized fractio n at z ~ 6.5 (see alsolHaiman & Cenl 
20051: iFurlanetto et all 2006: Mal hotra & Rhoads!l2006l) . However. 



Kashik awa et ai] d2006l) found a decline in the number density of 



bright LAEs over the same r edshift range (with a larger sample of 
objects), and lOta et al.l d2007h sugg e st a fa rther decline to z ~ 7. 
On the other hand, iDawson et all d2007h argued that the LAE 
luminosity function remains nearly const ant from z = 3-7 using a 
compilation of other surveys. Moreover, IStark et ail (|2007) found 
a surprisingly high abundance of faint emitters at z ~ 9. These 
objects are of particular interest to us, because they present the 
best possibility of being affected by reionization. 

The interpretation of these results is complicated by the un- 
known internal properties of the sources: in particular, how do their 
Lya line luminosities depend on the underlying halo mass, and 
what other factors affect the mapping? A number of recent stud- 
ies have developed models for the high-redshift LAE population 



Dijkstra et al. 2006; iFernandez & Komatsu 
~ 20071) . ~ 



dLe Delliou et al.l2006l: 

120071: IStark et alj 120071 ; iKobavashi et alj 120071) . but the relevant 
parameters (such as the star formation efficiency and time-scale 
and the initial mass function) are still mostly unco nstrained, es- 
pecially if they evolve. Indeed, Dijk stra et al.1 d2006t) have argued 
that the declining num ber density of bright LAEs observed by 
iKashikawa et alj a2006h can b e simply a result of the ev olving mass 
function. On the other hand, IKashikawa et alj yP06) argued that 
the LAE luminosity function evolved more than that of Lyman- 
break selected galaxies over this range, so that the decline could no t 
be attributed t o the galaxies themse lves (see also lOta et alj|2007h . 
Most recently, IDawson et alj d2007D argued that the LAEs evolve 
less rapidly than other galaxy samples. In this paper, we will present 
new simulations of the luminosity function evolution throughout 
reionization, focusing on the high-redshift "frontier" at z = 9, and 
try to place some model-independent constraints on the sources and 
the IGM. 



However, because of the difficulties involved, it is advanta- 
geous to consider other signatures of reionization. In particular, 
the clustering of LAE s can be a powerful probe o f reionization 
dFurlanetto etai]|2004 120061 ; iMcOuinn et al]|2007l) . This is be- 
cause their visibility is modulated by the pattern of ionized bubbles: 
large bubbles (which surround overdensities with many galaxies) 
have a small damping-wing optical depth, so that most sources will 
be visible, while small bubbles (where galaxies are rare to begin 
with) will appear to be entirely empty. Thus, during reionization 
observable LAEs should appear more clustered than the underly- 
ing population, with the boost decreasing as the ionized bubbles 
grow (which happens relatively quickly). Clustering of the overall 
galaxy population should evolve much more slowly than number 
densities, so a rapid change in the clustering is a good indicator of 
reionization ( McOu inn et ai]|2007l) . 

Existing work has examined clustering primarily in the con- 
text of the power spectrum (or correlation fu nction) of the galaxies 
dFurlan etto et al. 2006; Mc Ouinn et alj2007l ; see also the appendix 



McOuinn et al 1 120071 f or a brief discu s sion o n void and peak 



statistics). Most recently, McO uinn ct all d2007t) suggest that the 
lack of clustering evolution in the z = 5.7 and z = 6.6 LAEs rules 
out a substantially neutral universe at z = 6.6. However, the bubble 
modulation is actually highly non-gaussian. It is therefore useful 
to consider other statistics, especially early in reionization (when 
sources are so rare that measuring the scale-dependent power spec- 
trum robustly will be extremely difficult). Here we take a first step 
in this direction by considering a "counts-in-cells" measurement of 
the clustering. 

This paper is organized as follows. In fj2] we briefly outline 
the main points of our "semi-numerical" simulation, originally pre- 
sented in Mesinger & Furlanetto (2007a). In Sj3]we present the Lya 
optical depth distributions of galaxies in our simulations, as func- 
tions of halo mass and khi. In £|4]we show the resulting LAE lumi- 
nosity functions, comparing with the lStark et al.1 d2007h sample in 
£]4. II In ij5] we present statistics using "counts in-cell". Finally in 
Sj6]we summarize our key findings and offer some conclusions. 

Unless stated otherwise, we quote all quantities in comov- 
ing units. We adopt the background cosmological parameters (£7a, 
SIm, n, cr 8 , H ) = (0.76, 0.24, 0.0407, 1, 0.76, 72 km s" 1 
Mp c" 1 ), consistent with the three-year results of the WMAP satel- 
lite dSpergel et alj2006l) . 



2 SEMI-NUMERICAL SIMULATIONS 

We use an excursion-set approach combined with first-order La- 
grangian perturbation theory to efficiently generate density, veloc- 
ity, halo, and ionizatio n fields at z = 9. This "semi-nu merical" sim- 
ulation is presented in Mesinger & Furlanetto! d2007al) . to which we 
refer the reader fo r details. A simi l ar halo -finding scheme has also 
been presented bv lBond&Mversldl996l) and a similar scheme to 
generate ionization fields has been presented by |Zahnetai1d2007h . 

Our simulation box is 250 Mpc on a side, with the final den- 
sity, velocity and ionization fields having grid cell sizes of 0.5 Mpc. 
Halos with a total mass M > 2.2 x 10 s M are filtered out of 
the linear density field using excursion-set theory, with mass scales 
spaced as AA'I/M — 1.2. Note that we are able to resolve halos 
with masses less than a factor of two from the cooling mass likely to 
be pertinent mid-reionization (or more precisely, during the redshift 
interval 6 < z < 10), corresponding to gas with a temperature of 
T ~ 10 4 K (e.g.lEfstathioull992l ; lThoul & Weinberg! 199o1 : lGnedinl 
l200d : IShapiro et alJI 19940 . Halo locations are then adjusted using 
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first-order Lagrangian perturbation theory. The resulting halo field 
matches both the mass function and statistical clustering pro perties 
of halos in N-body simulations I IMesinger & Furlanettdl2007al) . 

In constructing the ionization field, the IGM is modeled as a 
two-phase medium, comprised of fully ionized and fully neutral re- 
gions (this is a fairly accurate assumption at high-redshifts preced- 
ing the end of reionization, unless the X-ray background is rather 
strong). Using the same halo field at z — 9, we generate ionization 
fields corresponding to different values of xm by varying a single 
effic iency parameter^] C, again using the excursion- set approach 
(c.f. lMesinger & Furlanettol2007al : lFurlanetto et alj2004l) . 

This semi-numeric approach is thus ideally suited to the LAE 
problem, because we are able to "resolve" relatively small halos 
and simultaneously sample a large, representative volume of ion- 
ized bubbles. Note that our "simulations" do not make any predic- 
tions (and only weak assumptions) about the Lya luminosities of 
these sources; we will discuss the mapping from halo mass (the 
fundamental quantity for our simulations) to observable proper- 
ties below. This mapping must also be prescribed in state-of-the-art 
cosmological simulations, which cannot self-consistently include 
hydrodynamics (and hence star formation) while also subtending 
a rep resentative volume during reionization (c.f., iMcOuinn et al.l 
120071) . 



3 DAMPING WING OPTICAL DEPTH DISTRIBUTIONS 

To study the effects of reionization, we first need to track the ab- 
sorption of line photons from neutral gas in the IGM. We divide the 
absorption into two parts: the resonant and damping wing compo- 
nents. This is convenient because they correspond to two spatially 
distinct sets of absorbers. Resonant absorption occurs whenever a 
photon that begins blueward of line center redshifts into resonance 
(either inside the HII region surrounding the source or in the neu- 
tral gas outside). Because the line-center optical depth is so large, 
this component can lead to nearly complete absorption - but only 
for photons on the blue side of the line (e.g.. lSantosll2004l) . We do 
not model this component in detail in this work, assuming that a 
constant fraction of the Lya line gets resonantly absorbed. 

On the other hand, photons that begin redward of line center 
only redshift farther away. It is therefore only the damping wings of 
the line that affect them, and the amount of absorption, exp[-ro], 
where td is the damping-wing optical depth, will depend sensi- 
tively on the size of the host HII region but be insensitive to the pre- 
cise xm inside the ionized region. It is this component that evolves 
most rapidly through reionization. FigureQ]shows the visible halos 
at z = 9, with M exp[-r D ] > 1.67 x 10 10 M Q , and xm ~ 0, 



1 This differs from the method recently used by IMcOuinn et al 1 J2007I) . 
who used a suite of N-body simulations with radiative transfer. They also 
argued that a faster but effective method was to generate ionization fields 
at several different redshifts (using a single radiative transfer simulation) 
but apply them to a halo field at a single redshift. They thus assumed that 
the ionization topology is only a weak function of redshift IMcOu inn et alj 
120071) . The speed of our approach, which does not require a radiative 
transfer algorithm, allows us to generate ionization fields at a single red- 
shift self-consistently, using the same halo field, merely by adjusting the 
source efficiencies. However, we confirm that the ionization maps are very 
nearly redshift-independent for most purposes (including those studied by 
McOuinn et al. 2003). The exceptions to this are the rare events occurring 
in < 10 — 3 of the typical fields of view discussed in Jj5] 



0.26, 0.51, 0.77, (left to right); the obscuration from damping wing 
absorption is obvious. 

We compute the total line center Lya optical depth along a 
randomly chosen line-of-sight (LOS) centered on a halo location at 
z s = 9.0. We do this by summing the damping wing optical depth, 
td, contribution from each neutral hydrogen patch (extending from 
Zbegi n to z en d) encountered along the LOS, using the approxima- 
tion dMiralda-Escu de 1998): 



TD 



6.43 x 10 

1 H - ^begin 
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m e cH(z B ) 

1 + Zend 



(1) 



j / 1 + Zbcgin \ _ j / 1 + Zend \ 



where nj/(z s ) is the mean hydrogen number density of the IGM at 
redshift z s , and 
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We use eq. |QJ to calculate the optical depth for each neutral hy- 
drogen patch, summing the contributions of patches along the LOS 
for 200 Mpc0 wrapping around the simulation box if needed. We 
construct distributions of td for each halo mass scale and ioniza- 
tion topology (i.e. Shi). We make sure to process LOSs from every 
halo of a particular mass scale, cycling through the halo list until 
each mass scale undergoes a minimum of 3 x 10 4 such Monte- 
Carlo realizations. We also include the component of the source 
halo's peculiar velocity along the LOS, v, in our estimates of td 
by substituting z s — > z B + v /c.0 

As delineated above, we only compute td at a single observed 
wavelength, A bs = 1215.67(1 + z s + v/c) A, instead of over the 
entire wavelength extent of the Lya emission line. To zeroth or- 
der, the damping wing optical depth varies only weakly across the 
typical scale of the Lya emission line, although its shape does pro- 
vide useful infor mation about the IGM properties ; we refer the in- 
tereste d reader to iMe singer & F urlanettol d2007bl) : IMcOuinn et al.l 
j2007l) . We also emphasize that, in our two-phase IGM approxi- 
mation, we do not model the resonance contribution to the total 
optical depth. However, any mass-independent attenuation of the 
Lya line can be swept into the assumed halo mass <-» Lya lu- 
minosity mapping (c.f. $4A\ . Most studies (including our own in 
Sj4]l assume such a mass-independent resonant attenuation, justi- 
fying it with the narrow halo m ass range (~ 1 dex ) probed by 
present instruments/surveys (e.g. iDiikstra et alj|200"6l ; IStark et all 
l2007l ; lMcOuinn et al.ll200% . We defer a detailed study of this as- 
sumption to a future work. 

In Figure [2] we show some of the optical depth distribu- 
tions generated by the above procedure, for xm{z = 9) = 

2 This number was chosen experimentally in order to ensure convergence 
of the td distributions at the mass scales and neutral fractions studied in 
this work. 

3 Note that for simplicity we do not include the peculiar velocity of the 
neutral IGM patches, which could be correlated on large scales with the 
peculiar velocities of the sources, thus diminishing the impact of velocities 
on td ■ However, note from Fig. [2] that peculiar velocities do not play a 
major role in the optical depth distributions except when td 2> 1 and 
Sjji ~ 1 (where the absorption is so strong that its precise value does not 
matter for our purposes). The treatment of velocities is uncertain in any 
case because we ignore the possibility of galactic winds, which can move 
the Lya lines redward and decrease the absorption I Santos 20041) . 
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Figure 1. Maps of visible halos at z = 9, assuming M min = 1.67 X 10 10 M Q , and x m RJ 0, 0.26, 0.51, 0.77, (left to right). All slices are 250 Mpc on a 
side and 20 Mpc deep (corresponding to a narrow band filter with R = A/AA ~ 125). The 1500 3 halo field is smoothed onto a 200 3 grid here for viewing 
purposes. 




Figure 2. Damping wing optical depth distributions at xm(z = 9) = 0.93, 0.72, 0.51, 0.34 (left to right). Curves correspond to LOSs originating from 
halos with masses M = 2.6 X 10 11 , 1.2 X 10 11 , 5.4 X 10 10 , 2.5 X 10 10 , 1.1 X 10 10 , 5.1 X 10 9 , and 2.3 X 10 9 Mq (left to right). Solid curves include 
the peculiar velocity offset of the host halo; dotted curves do not. 



0.93,0.72,0.51,0.34 (left to right). The panels show r D dp(> 
m , M , xm) / Otd , i.e. the probability per Iiitd that a LOS orig- 
inating from a halo of mass M embedded in an IGM with a neu- 
tral fraction of xm has an optical depth of td- Curves correspond 
to LOSs originating from halos with masses M = 2.6 x 10 11 , 

1.2 x 10 n , 5.4 x 10 10 , 2.5 x 10 10 , 1.1 x 10 10 , 5.1 x 10 9 , and 

2.3 x 10 9 Mq (left to right). Solid curves include the peculiar ve- 
locity offset of the host halo; dotted curves do not. 

In general, the optical depth distributions become narrower 
and their mean values decrease as the halo mass increases. This is 
to be expected because the halo bias is a function of mass, with the 
more massive halos more likely to sit in larger overdense regions - 
which in turn contain more i onizing sources and larger HII bubbles 
than l ess massive halos (e.g. lFurlanetto et al . 2006; Mc Ouinn et al.l 
|2007|) . Furthermore, since the overlap of several HII bubbles makes 
a smaller fractional change to the size of a large bubble hosting 
the most massive halos, the distributions of td are narrower for 
massive halos. Both of these effects diminish as reionization pro- 
gresses, and the td distributions start merging together. 

It is also evident from Fig. [2] that galaxy peculiar veloci- 
ties broaden the optical depth distributions, shifting their means to 
smaller values. This effect is the strongest for the smallest mass ha- 
los, because they are typically surrounded by the smallest HII bub- 
bles (whose edges are at the smallest velocity offsets from the Lya 
line center). The effects of source halo velocities on td also dimin- 
ishes with decreasing xm\ for xm < 0.7 peculiar velocities play 
a negligible role in determining td for halos with M > 10 9 Mq. 
More importantly, Fig.|2]shows that peculiar velocities have virtu- 
ally no impact on the optical depth distributions in the pertinent, 
td ~ 1 regime. Hence, they should not have a noticeable effect on 
any of our conclusions. 



Figure [3] quantifies the evolution of the optical depth distribu- 
tions with xm - The distributions were generated using source halos 
at a fixed mass scale of M = 5.4 x 10 10 ' Mq. Curves correspond 
to xm = 0.26, 0.34, 0.42, 0.51, 0.61, 0.72, 0.83, 0.88, 0.93, from 
left to right. The curves illustrate that the more complex topolo- 
gies and large scatter among disparate LOSs at lower values of 
xm significantly broaden the distributions of td, in addition to 
decreasing their means. This is not because the bubble size dis- 
tribution widens (in fact, it narrows as reionization progresses; 
iFurlanetto et al.l2006h but because some LOSs pass almost entirely 
through neighboring ionized bubbles, while others pass through 
long skewers of neutral gas. A similar effect has been shown to 
hav e important cons equences for the interpretation of quasar spec- 
tra dLidz et al.ll2007l) . 



3.1 Comparison to an Analytic Model 



Furlanetto et al. pres ented similar opt i cal dep th distributions 

calculated via the analytic IFurlanetto et al.l d2004h model for the 
ionized bubbles during reionization. Our "semi-numeric" simula- 
tion is also based on this model, but it includes additional effects 
such as the complicated, non-spherical geometry of the HII regions. 
It is therefore illuminating to compare our distributions with those 
of the p urely analytic model. F or the latter, we follow the calcu- 
lation of Furlan etto et al.l ([2004) except that for simplicity we as- 
sume that all ionizing sources sit in the center of their bubble. 
The damping wing optical depth distributions then follow from 
arguments similar to the "extended Press-Schechter" formali sm 
jPress & Schechterlll974l ; lBond et alJll99ll ; lLacev & Colefl 993). 

Figuref4]shows our results for halos with M = 5.4 x 1O 1O M0 
and a variety of neutral fractions. The solid curves are again taken 
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Figure 3. Damping wing optical depth distributions generated using LOSs 
originating from halos with masses M = 5.4 X 1O 1O M0. Curves correspond 
to x m = 0.26, 0.34, 0.42, 0.51, 0.61, 0.72, 0.83, 0.88, 0.93, left to right. 
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Figure 4. Damping wing optical depth distributions for halos with masses 
M = 5.4 X 1O 1O M . The solid curves are computed from our simulations, 
and the dotted curves are the corresponding best-fit lognormal distributions. 
The dashed curves show the predictions of the analytic model. Within each 
set, the curves correspond to x m = 0.26, 0.34, 0.51, 0.72, 0.88, from left 
to right. 

from the simulations. The dashed curves show the predictions of 
the analytic model. There are two key differences with the sim- 
ulated results. First, the analytic model predicts that the distribu- 
tion peaks at larger td than th e simulation. This is not a surprise: 
iMesinger & Furlanettol d2007ah showed that, when defined in this 
way, the simulations have larger ionized bubbles early in reioniza- 
tion. This is because of neighboring bubbles that slightly overlap; 
the spherical geometry required by the analytic model does not ef- 
fectively capture such events. This can reduce the apparent optical 
depth by a factor of ~ 3 early on, so it is not a small effect. 

Second, the analytic model also under-predicts the width of 



the distribution, particularly on the small-To tail at the end of reion- 
ization. This is significantly larger than the differences in the dis- 
tributions of bubble sizes. Instead, it likely results from the an- 
alytic model's assumption of a uniformly ionized medium out- 
side of the source's host bubble: as described above, many lines 
of sight will pass through nearby, fully-ionized bubbles, allow- 
ing for the existence of many (nearly) clear lines of sight (see 
IMesinger & Furlanettoll2007bl : lMcOuinn et alj|2007l) . 

Because the analytic model does not provide a good fit, we 
searched for a better representation of the distributions. We find that 
our results are nearly always well-fit by log-normal distributions, 



dp(> t d ) 



1 







td T D y/2na 2 D 



exp 



Qhtd - ^dY 



2ol 



(2) 



where fiD and od are parameters determined by fits to the sim- 
ulated distributions. Some example fits are shown by the dotted 
curves in Figure |4j in general, the lognormal distribution overes- 
timates the strength of the tail at large td and underestimates its 
strength at small td, with the skewness increasing for smaller ha- 
los and larger neutral fractions. The following simple bilinear fits 
for the parameters are accurate to < 10% over the entire mass range 
of our simulations and from Shi = 0.26-0.93: 

(id = -3.37 + log Mm (-0.115 - 0.587i H i) + 5.30x m (3) 
u d = 1.68 + log Mio (-0.155 - 0.265x H i) - 1.08xhi, (4) 

where M = Mio x lO lo A/0. Note, however, that this fit only 
applies to halos at z = 9; the dependence on mass will no doubt 
change with redshift. It is possible, however, that the dependence 
on Shi is more robust, because the ionization pattern is near! 



nearly 

ham 



independe nt of the t iming of reionization (Fur lanetto et al.ll20Qi 
McOu inn etai]|2007l) . 



4 Z=9 LAE LUMINOSITY FUNCTIONS 

With our optical depth distributions in-hand, we can proceed to 
generate z — 9 LAE luminosity functions . To do this, we make 



the standard simpli f ying assumption (e.g. Furlanetto et al.| 20061 : 
lDiikstraetai1l2006l: [stark et al.ll2007l ; McOuinn et al. 200^ of a 



deterministic, linear mapping of halo mass to Lya luminosity, 
M oc L. As mentioned above, this assumption is often justified by 
the narrow mass range probed by existing instruments. Using this 
ansatz, the observed Lya luminosity of a LAE, L b s = Le~ TD 
(where the "intrinsic" Lya luminosity, L, includes resonant attenu- 
ation), can be written in terms of the halo mass: M Q b s = Me~ T ° , 
where M b s is the apparent mass of the halo under this mapping, 
after attenuation by the IGM. Given the minimum observable lu- 
minosity, L(Mmi n ) (where M m i n is the mass that would produce a 
luminosity L without any IGM absorption), one could then detect 
halos with td < — ln(A/ m i n /M). In order to maintain generality, 
we postpone the discussion of the L(M) mapping until $4. 11 where 
we compare the luminosity functions to observations. 

Hence, the cumulative number density of observable halos can 
be written as: 



n{> M n 



,xm) 



dM 



dn(> M) 



Af min 
ln(Af/M min ) 



dM 



(5) 



d:Tl 



dp(> t d ,M, x m ) 
8t d 



where dn(> M)/dM is the number density of halos per unit mass, 
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Figure 5. Luminosity functions at z = 9. Top: Number density of ob- 
jects brighter than some observed luminosity £ bs CWmi»)i i- e - with M 
exp[— to] > M m - ln . Curves correspond to xm 0, 0.26, 0.61, 0.72, 
0.83, 0.88, 0.93 (top to bottom). Bottom: Ratio of luminosity functions: 
n(> M m i n , xm)/n(> M m i n , 0). Curves correspond to Sjji = 0.26, 0.61, 
0.72, 0.83, 0.88, 0.93 (top to bottom). 

and the second integral is the fraction of halos of mass M which 
can be detected above the threshold M m i n . 

"Luminosity" functions generated with this procedure are 
shown in the top panel of Figure[5] The curves correspond to xm ~ 
0, 0.26, 0.61, 0.72, 0.83, 0.88, 0.93 (top to bottom). The bottom 
panel shows the ratio of the apparent and intrinsic luminosity func- 
tions: n(> Mmi n , XBi)/n(> M m in, 0). Curves correspond to xm 
= 0.26, 0.61, 0.72, 0.83, 0.88, 0.93 (top to bottom). 

The luminosity functions in Fig.[5]represent the first such pre- 
dictions for z — 9 which include the effect of inhomogeneous 
reionization. As in analytic models and other simulations, the de- 
cline in the number density of observable LAE in a partially neu- 
tral universe when compared to a fully ionized univ erse (see bottom 
panel of Fig.[H> is nearly independent of halo mass l Furlanetto et al.l 



panel ot rig.QJl is nearly independent ot nalo mass (furlanetto et al. 
l200dlMcOuinn et al.l2007»l 4 lMcOuinn et alj d2007t ) examined this 
question in detail and showed that the scale-independence is robust 
to other assumed mappings between halo mass and luminosity. One 
explanation is that a lognormal distribution of optical depths, act- 
ing on a power-law intrinsic luminosity function, produces a nearly 
power-law apparent luminosity function. The bright end, where the 
intrinsic luminosity function falls exponentially, would also fall 
exponentially except that the broad distribution of optical depths 
nearly erases the change in slope. Our results are consistent with 
this explanation (see also McOu inn et all2007l) . especially because 
we have shown that the optical-depth distribution is broadest for 



massive objects. The (nearly) scale-independent suppression ap- 
pears to be a generic feature of reionization, requiring some other 
explanatio n for the decline i n num ber density for bright LAEs ob- 
served by iKashikawa et all j2006j) . and possibly that of I Ota et"al] 
J2007h as well. 



4.1 Existing Observational Constraints 

By taking advantage of the strong magnific ation provided b y grav- 
itational lensing through galaxy clusters, [stark et al] J2007) found 
six candidate (> 5a) LAEs in the redshift range z = 8.7-10.2, 
with all but one falling within z — 9 ± 0.35. The candidates have 
unlensed luminosity estimates ranging from 10 41 ' 2 to 10 42 ' 7 ergs 
s _1 . A search for additional emission lines expected if the candi- 
dates were low-z interlopers has been completed and has not found 
evidence for a low-z scenario for any of the six candidates. If gen- 
uine, these detections can provide an invaluable first glimpse into 
the z ~ 9 universe. 

In order to compare our cumulative number densities in Fig. 
[5] with observations, one needs a mapping of Lya luminosity, L, 
to halo mass, M. Even if one assumes a deterministic, linear rela- 
tion, there are still many unknowns. The simplest version follows 
a well-trod path. We begin by assuming that about 2/3 of the ion- 
izing ph otons absorbed wi thin the galaxy are converted into Lya 
photons jOsterbrock|[T989h . One can then write the conversion as 
L = 0.67hv a (l — feac)p*^fT JlIea , where v a is the rest-frame Lya 
frequency, / csc is the escape fraction of ionizing photons, p* is the 
star formation rate (SFR), e 7 is the ionizing photon efficiency per 
stellar mass, and T 7 , rcs is the fraction of Lya photons which escape 
from the galaxy without getting resonantly absorbed[f] Assuming 
that galaxies steadily convert a fraction, /» , of their gas into stars 
over some mean time-scale, t„, and that / csc <C 1, one can write 
the above relation as: 



(6) 



= 1. 



, 1rv — 12 / e ~l f*T~f ,ros \ 

x 10 erg I — J 



M 



The free parameters in equation ® are all almost unconstrained at 
high redshifts. Hence, we are interested in exploring a wide range 
of possibilities. From an astrophysical standpoint, it is much easier 
to use observed number densities to set robust upper limits on xm 
than it is to set robust lower limits, since in the theoretical model 
one at least has the hard upper limit of using all available gas in 
every galaxy above the detection threshold. Setting conservative 
upper limits on aim translates to maximizing (e 7 /*T 7 . ros //;, ) in the 
L <-» M mapping. K eeping this in min d, we apply four different 
L <-» M choi ces to thelStark et al.ld2007h sample, with values of e 7 
all taken from lSchaerea ( 120031) assuming metallicities of Z ~ 0.04 
Zt, for Pop II and Z ~ 10 -7 Z* for Pop III star formation: 

(i) z ~ 6 parameters with Pop II stars. Here we use values 
which have been shown to fit z = 5.7 LAE luminosity functions 
fairly well toiikstra et al]|2006l : Istark et alj|2007l ; iMcOuinn et all 
l2007h . We set /,T 7 , IB8 = 0.1, U = 2/3 of the Hubble time = 
1.87 x 10 16 s, and e 7 = 6.3 x 10 60 ionizing photons M" 1 as 
might be expected from a Pop II IMF. This translates to the relation 



4 The slightly "wavy" features in some of the curves in Fig. |5]result from 
the discrete halo masses returned by our halo filtering procedure. The effect 
is more noticeable at small xjji, when the tjj distributions become weaker 
functions of M (see Fig.[2]and eq.|5J. 



5 As we have seen, our simulations model the damping wing component 
of the IGM absorption, tjj . Hence, in order to compare with our simulated 
number densities, we only need worry about the fraction of Lya photons 
that are resonantly absorbed. Defined as such, T 7!res ranges approximately 
from 0.5 to 1. 
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L = 6.3 x 10 31 erg s _1 (M/M ), and to roughly 0.24 (/ csc /0.02) 
ionizing photons per H atom0 

(ii) z ~ 6 parameters with Pop III stars. Here we again take 



5.7 LAE luminosit 



ty 
T) 



values of /*, T 7 , ros , and t» which fit . 

functions but use a Pop III IMF for e 7 at z = 9.[stark et alj J20a 
show that a similar model can fit the lStark et al.1 1 2007 ) constraints 
moderately well. Specifically, we set /*T 7 , r cs = 0.1, t* = 2/3 of 
the Hubble time = 1.87 x 10 16 s, and e 7 = 8.1 x 10 61 ionizing 
photons Mq 1 as might be expected from a Pop III IMF. This trans- 
lates to the relation L = 8.2 x 10 32 erg s _1 (M/M Q ), and to 3.1 
(/esc/0.02) ionizing photons per H atom. 

(iii) "maximally" conservative with Pop II stars. Here we push 
the limits of a Pop II model by setting /»T 7 ,res = 1, t* = the 
dynamical time = 4.6 x 10 15 s, and e 7 = 6.3 x 10 60 ionizing 
photons Mq 1 as might be expected from a Pop II IMF. In other 
words, we assume that every baryon inside every halo is converted 
into stars in one dynamical time (with this star formation episode 
synchronized across all halos) and every Lya photon escapes the 
environs of the galaxy. This translates to the relation L — 2.6 x 
10 33 erg s _1 (M/M ), and to 2.4 (/ csc /0.02) ionizing photons per 
H atom. 

(iv) "maximally" conservative with Pop III stars. Here we 
completely push the limits by setting f*T ltles = 1, t» = the dy- 
namical time = 4.6 x 10 15 s, and e 7 = 8.1 x 10 61 ionizing photons 
Mg 1 as might be expected from a Pop III IMF. This translates to 
the relation L = 3.3 x 10 34 erg s" 1 (M/M Q ), and to 31 (/esc/0.02) 
ionizing photons per H atom[j 

The relations implied by (ii) or (iii) are probably the most 
reasonably conservative estimates, as any stronger sources would 
have produced many more photons than are required to reionize the 
IGM. Relations with more efficient star formation [such as the ex- 
treme (iv)] would result in a n early reionizatio n, contrary to the evi- 
dence s uggested by WMAP jPage et al.l2006h and the SPS S quasar 
spectra dFan et all2006l ; lMesinger & Haimanll2004l2007h . 

Numb e r den sities derived from the z ~ 9 candidates in the 
IStark et alj d2007l) survey, transformed to mass units using the 
L <-» M relations above are shown in Figure [6] with the (i), (ii), 
(iii), (iv) mappings shown right to left in the figure[fl Note that ex- 
tending the thi > luminosity functions from our model to the 
low scales required by the most conservative L <-> M relation is 
computationally prohibitive. Luckily, the suppression of the lumi- 
nosity function is almost independent of M (see the bottom panel 
of Fig.[5j, allowing us to extend the Shi > estimates by using the 
mean suppression ratios obtained over the mass range 10 9 - 10 11 
Mq. Solid curves thusly generated correspond to xhi ~ 0, 0.26, 
0.61, 0.72, 0.83, 0.88, 0.93 (top to bottom). 

To estimate the size of the total (Poisson and cosmic variance) 
uncertainties, we have performed Monte-Carlo simulations mod- 



We estimate the number of ionizing photons per H atom by 
(e*/n H )/*/esc(fV^M) J Mdn{> M)/dM dM, where «h is the 
number density of hydrogen atoms, the integration extends over all halos 
at 2 = 9 above the cooling threshold T v i r = 10 4 K, and we set T 7jres = 1 
in order to be conservative. 

7 We will see that this extreme mapping implies that the Sta rk et al] J2007t) 
galaxies are below the T v j r = 10 4 K atomic cooling threshold. If we in- 
stead allow stars to form down to the H2 cooling threshold, T v ; r ~ 300 
K, this mapping would imply a whopping 140 (/ csc /0.02) ionizing photons 
per H atom! 

8 Of the three luminosity bins presented in IStark et alj 120071) , we only 
show the most tightly constrained one at L = 10 42 erg s -1 This is the 
only number density with less than 100% Poisson errors. 




10 7 10 8 10 9 10 10 10 1 



Figure 6. Luminosity functions at 2 = 9. Points correspo nd to number 
densities derived from the six 2 ~ 9 candidates in the Sta rk et all fe007t) 
survey, mapped using the L *-* M relations in the text, with (i), (ii), (iii), 
(iv) mappings shown right to left in the figure. Solid curves correspond to 
Xfu S3 0, 0.26, 0.61, 0.72, 0.83, 0.88, 0.93 (top to bottom). The dotted curve 
corresponds to xjji ~ 0, but with as = 0.86. Total error bars (Poisson and 
cosmic variance), generated with the Monte-Carlo procedure described in 
text, are shown for xm S3 and 0.72, at several choices of M m ; n . 



eling the IStark et al. I J2007I) observations, which used the lensing 
signature of 9 clusters to probe a combined volume of 13.5 Mpc 3 , 
given L > 10 42 erg s _1 . Specifically, we tile our simulation box 
with cells whose volume is equal to the mean volume probed by 
each cluster, 1.5 Mpc' 0We then repeatedly randomly select 9 such 
sample volumes in our simulation box, keeping track of the total 
number of LAEs in each group of 9 sample volumes^ The total 
error bars thus generated for xm ~ and 0.72, at several choices 

of M m i n are now shown in Fig. [6] 

Assuming that the IStark etalj i f2007l) candidates are genuine 
and using the maximally conservative relation in (iv), xm{z ~ 9) 
cannot be constrained: such a scenario would permit the IGM to be 
almost entirely neutral. However, it is very unlikely that all the fac- 
tors in eq. l|6} conspire to provide such efficient star formation and 
Lya photon transmission (and even if they did, reionization would 
likely have ended long before). The more reasonably conservative 
models, (ii) and (iii), prefer aim < 0.2 and aim < 0.7, respectively. 
The L <-» M relation from (i), which has been shown to provide 



9 Note that the estimated geometry of this lensed long-slit spectroscopic 
survey can be approximated with a long parallele piped with a 0.02 Mpc 2 
FOV and a LOS distance of 500 Mpc per cluster (Stark et al. 2007). Since 
our box is 250 Mpc on a side and our halo field resolution scale is ~0. 17 
Mpc, we must content ourselves with a flatter cell of the same volume with 
which to tile our box: a 0.03 Mpc 2 FoV with a LOS distance of 50 Mpc. 

10 Note that this tiling procedure does not precisely mimic the survey, be- 
cause we should ideally select volumes without tiling beforehand. Obtain- 
ing accurate statistics by randomly sampling volumes of our simulation box 
would be computationally prohibitive at small source number densities. 
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Figure 7. Fraction of mock-survey fields containing > N number of LAE. Curves correspond to xm 0, 0.46, 0.56, 0.67, 0.72, 0.77, 0.83, 0.88, 0.93 (top 
to bottom), assuming 

A^min = 8.06 X 10 10 , 3.68 X 10 10 , and 1.67 X 10 10 M in the sub-panels (top to bottom). Panels assume FoVs of: 1.5' X 1.5' (left 
panel) and 3' X 3' (right panel). 



a decent fit to z = 5.7 LAE luminosity functions, requires quite 
massive halos and drastically overestimates the observed z ~ 9 
abundances (overestimating the mass function by a factor of ~400 
at the required masses). It is therefore inconsistent with even a com- 
pletely ionized universe. Note however that since our estimates are 
based on the limited volume of our 250 Mpc 3 simulation box, we 
might be slightly underestimating the cosmic variance contribution 
to these error bars. 

One possible remedy is if our background cosmology is 
wrong. The dotted curve shows the mass function if xm ~ 0, but 
with erg = 0.86, a s the comb ined three-year WMAP and Lyman-a 
forest data prefer jLewisl2006h . This higher value of erg = 0.86 in- 
creases the abundance by a factor of several at the high-mass end, 
but it has only a modest effect in the range of interest. The observed 
abundances in our model (i) are still inconsistent with a xm ~ 
universe, overestimating the mass function by a factor of ~100. 

On a more fundamental level, the lStark et aljj2007h LAE sam- 
ple, if genuine, requires substantial star formation in halos with 
M < 1O 9 M0. This is only a factor ~ 10 greater than the atomic 
cooling threshold at z ~ 9, and these objects would correspond to 
some of the smallest galaxies observed at any redshift. They must 
also be fundamentally different from sources at z = 5.7, with ei- 
ther much higher star formation rates or stars that are much more 
efficient at producing ionizing photons. Unless the escape fraction 
of UV photons is extremely small, any scenario that is consistent 
with the observations would produce several ionizing photons per 
IGM baryon, so we would expect reionization to have been well un- 
derway (if not complete) by z — 9. Needless to say, such a scenario 
would be difficult to reconcile with reionization at z ~ 6-7. 

To this point, we have assumed that all galaxies above M m i n 
are LAEs. However, at moderate redshifts, we know that a majority 
of Lyman-b reak galaxies (LBG s) have weak or absent Lyaemission 
lines (e.g. Sha plev et al.ll2003l) . though there is some evidence of 
an inc rease in the fraction of LAEs among LBGs at higher red - 
shifts ([Dawson et alJl2004lHu et al.ll2004IShimasaku et alj|2006t) . 



Incorporating this into Figure|6]would be equivalent to shifting the 
curves downward by an amount equal to the fraction of all galax- 
ies that are LAEs. Thus this only strengthens our arguments, be- 
cause it widens the disparity between the observed number density 
of sources and the theoretical curves. 



5 COUNTS-IN-CELLS STATISTICS 

LAE abundances and luminosity functions have certainly proven 
to be very useful in studying the z ~ 6 universe. However, their 
interpretation is inevitably controversial, because knowledge of the 
M <-> L mapping is required for a meaningful estimate of xui- We 
have already seen how difficult it is to place meaningful constraints 
with this method. 

It is therefore worth considering other, more robust, sig- 
natures. Reionization modulates t he observed LAE fie l d, in- 
creasing the clusteri ng of sources dFurlanetto etai] |2004 1200(3 : 
McO uinn et"afl r2007). Galaxies inside large HII bubbles are more 
likely to be seen than those inside smaller bubbles. Thus, during 
reionization, we expect a field of view (FoV) that contains a LAE 
to have a higher probability of containing another LAE, than would 
be the case after reionization. This reionization-induced clustering 
is illustrated in Figure Q] and can be quantified in any number of 
ways. To date, studies have focused on the linear bias and power 
spectrum iFurlanetto et alj |2006; McOuinn et al. 2007). However, 
the modulation is non-gaussian, so other statistics may be as pow- 
erful. 

Here we explore simple, statistical estimates of "counts-in- 
cells" that can be easily applied to future surveys. These simple 
number counts essentially represent an "integrated" measurement 
of the clustering and hence can be easier to generate with a lim- 
ited observational sample than other statistical indicators such as 
the power spectrum/correlation function, which try to measure the 
detailed scale dependence. They also place less stringent require- 
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ments on the survey strategy than, for example, the power spec- 
trum, because they are easier to interpret with non-uniform survey 
coverage (for example, following up bright sources to search for 
fainter neighbors; see below). They are therefore most likely to be 
powerful early in reionization or at extremely high redshifts, when 
sources are rare. 

To study this reionization-induced clustering, we perform a 
mock-survey in our simulation box. We tile our simulation box and 
tabulate "in-cell" number counts, given xm and M m i n . For sim- 
plicity, we assume the survey FoVs are cubical volumes, but our 
results can easily be extended to more detailed survey specifica- 
tions, once they are available. Note also that cell sizes are fairly 
arbitrary, as surveys can be broken down into small cells for anal- 
ysis, and the optimal choice will depend on the survey depth and 
volume. In Figure [7] we present the fraction of our mock-survey 
fields containing > iV LAEs. The panels are constructed assuming 
FoVs of 1.5' x 1.5' (left panel) and 3' x 3' (right panel), corre- 
sponding to comoving sizes of 4.23 3 and 8.45 3 Mpc 3 , respectively. 
The latter (in angular coordinates) i s the FoV of the N ear-Infrared 
Spectrograph (NIRSpec) on JWST dGardner et alll2006h ; other fu- 
ture IR instruments subtend comparable areas (for example, the In- 
frared Multi-Object Spectrograph (IRMOS) on the proposed TMT 
has a 5' x 5' Fo\©. Curves correspond to xm « 0, 0.46, 0.56, 
0.67, 0.72, 0.77, 0.83, 0.88, 0.93 (top to bottom), assuming M min = 
8.06 x 10 10 , 3.68 x 10 10 , and 1.67 x 10 10 M Q in the sub-panels 
(top to bottom). In Figure [8] we show the ratio of the PDF curves 
from the bottom right panel in Fig.|7]to those expected from a pure 
Poisson distribution. Note the relatively small change in the prob- 
abilities during the later stages of reionization, from xm = 0.5 to 
xm ~ 0, compared to the rapid evolution during the initial stages. 
This is, of course, primarily because the strong damping-wing ab- 
sorption causes most of the sources to fall below the survey detec- 
tion threshold before Shi ~ 0.5. 

In other words, because the curves shown in Fig.|7]correspond 
to raw number counts, they include two distinct consequences of 
higher values of Shi : (i) the enhanced clustering footprint of reion- 
ization; and (ii) the decrease in the absolute number of observed 
LAEs. As is the case for luminosity functions, (ii) is difficult to 
disentangle from astrophysical uncertainties and hence is not an 
ideal probe of reionization. The first offers more promise, as the 
enhanced clustering of LAEs from reionization could only be mim- 
icked by a large shift in the masses of underlying halos hosting 
LAEs, as discussed below. Thus we wish to isolate the first effect. 

To this end, we note that the var iance of the in-cell count dis- 
tributions can be represented as (e.g. |Peebleslll98d) : 

a% = (N) + (^Pj J J dVidV 2 £ 12 , (7) 

where (N) is the mean number of LAEs in a cell (i.e. field), and 
the integrals over the two-point correlation function, £12, are per- 
formed over the cell volume, V. Keeping this in mind, in Figure(9] 
we plot: 

£ia = {el - (AT)) / (N) 2 , (8) 

essentially the average value of the correlation function over a cell's 
volume. Curves correspond to M m j n = 3.68 x 10 10 M a (dotted), 
and 1.67 x 10 10 M© (short-dashed). 

If sources are uncorrelated (i.e. Poisson distributed), o~1 — 



11 http://www.tmt.org 
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Figure 8. Ratio of the PDF curves from the bottom right panel in Fig. 
UJ (in reverse order, bottom to top) to those expected from a pure Poisson 
distribution. 

(N), and £12 = 0. The biased nature of structure formation man- 
ifests as a constant, positive £12 (because we hold redshift, and 
hence the halo population, fixed), and reionization manifests as an 
increase in £12. Shallower surveys with larger M m % n have a larger 
£12, because these rarer objects are more highly clustered intrinsi- 
cally. Interestingly, the curves are fairly flat for small neutral frac- 
tions; this is because nearly all of the objects are visible, so the 
observed clustering nearly matches its intrinsic value. Reionization 
induces a sharp rise in £12 at xm ~ 0.5, with the location of the 
rise being a very weak function of the cell size and M m i n . This 
increase is a result of the "small-sc ale" clustering enhancement de- 
scribed by Furlanetto et al. (2006), which occurs because only the 
sources inside rare large bubbles remain visible early in reioniza- 
tion, but at the same time a large fraction of sources inside such 
bubbles are visible. The enhancement is nearly independent of the 
underlying halo mass. 

The long-dashed curve in the right panel of Fig. [9] shows the 
same quantity as the short-dashed curve, but in a scenario in which 
(N) is held constant regardless of the neutral fraction (at the num- 
ber density found with Shi = 0.77) by randomly selecting halos 
above the xhi -dependent mass threshold. The fact that the long and 
short dashed curves overlap illustrates explicitly that we have accu- 
rately removed the Poisson component of the fluctuations from our 
statistic. 

The dot-dashed curve in the right panel of Fig. [9] is gener- 
ated by selecting only the most massive halos with number density 
fixed by the value at xm = 0.77 for M min = 1.67 x 10 10 M 
(unlike the random selection performed for the long-dashed curve). 
The curve is flat at small xm since it corresponds to the same set 
of massive sources. The curve then increases at xm > 0.6, when 
ionized regions are small enough to make some of these highly- 
biased, massive sources fall below the the M m i n = 1.67 x 10 10 
Mq detection threshold. We see that the reionization-induced rise 
in £12 surpasses the intrinsic clustering of even the most massive 
sources with the same number density at xm > 0.6. Thus there 
is no question that reionization can be observed through cluster- 
ing measurements, at least sufficiently early in the process. Hence, 
counts-in-cells can provide a simple, robust probe of reionization. 

Detecting the sharp rises evident in Fig. [9] could be a 
"smoking-gun" signature of reionization; however, at any partic- 
ular redshift we have only one measurement and (as with the lu- 
minosity function) can only compare with measurements at dif- 
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ferent redshifts[3 The fact that the curves in Fig. [9] are strong 
functions of M m i n complicates the interpretation of any such de- 
tection, because any increase in £12 with redshift could be be- 
cause of galaxy evolution alone; for example, even if M m i n remains 
constant with redshift, the bias of the objects would still increase 
with redshift. This degeneracy can be overcome if reionization pro- 
gresses rapidly compared to the underlying structure, or if one cor- 
relates the LAE field with a Lyman-break galaxy (LBG) field, since 
the detectability of LBGs should not be affected by changes in xm 
(see lMcOuinn et al 1 l2007l for discussion of similar issues with ref- 
erence to the power spectrum). 

Unfortunately, such a process is always uncertain. Instead we 
favor measuring the excess probability (over that in a completely 
ionized universe) that a cell will contain TV or more LAEs, given 
that it already contains at least one LAE: 

AP* HI (> N\ > 1) = P* HI (> N\ > l)-ftH!„o(> N\ > 1) ,(9) 

where P(> TV > 1) and Px H1 ^o(> N\ > 1) are the probabili- 
ties that a cell containing at least one LAE will contain N or more 
LAEs, in a partially ionized and a fully ionized universe, respec- 
tively, normalized to have the same number density of LAEs. We 
normalize our xm ~ field by randomly choosing LAEs above 
M m in, until we obtain the same number of LAEs as in the partially 
ionized box. Note, however, that the proper normalization proce- 
dure is not well-defined so long as the mapping between mass and 
luminosity remains unknown. However, we checked that selecting 
only the most massive halos, until we obtain the same number den- 
sity, did not appreciably change the results for the N — 2 curves. 

The basic idea of this measure is to remove the overall nor- 
malization of the galaxy number density, whose evolution is un- 
certain, but to retain the enhanced probability of observing groups 
of sources inside the rare large bubbles. Also note that while the 
£12 statistic in eq. <[8j only uses second-order correlations and can 
be obtained by integrating over the power spectrum, the counts-in- 
cells method takes advantage of higher-order correlations and the 
excess probability from eq. l[9} shows this explicitly for N > 2. 

In Figure [TO] we plot the excess probability from eq. J5J, for 
1.5' x 1.5' (left panel) and 3' x 3' (right panel) cells. Solid (dotted) 
curves assume M min = 1.67 x 10 10 (3.68 x 10 10 ) M®. Thick and 
thin curves correspond to N = 2 and 3, respectively. 

The enhanced clustering footprint of reionization can easily 
be seen from Fig.[l0] For x m > 0.5 and N = 2, AP> 10%, 
and it increases by a factor of > 6 from xm = 0.2 to 0.8. Fur- 
thermore, the N = 2 curves are much more sensitive to xm than 
to M m i n , suggesting that this statistic of the reionization-induced 
clustering cannot be mimicked by a change in the mass thresh- 
old of the survey, or analogously by a shift in the underlying halo 
masses hosting LAEs. Hence, the excess probability that a cell con- 
taining LAEs contains more than one LAE is a robust indicator of 
changes in xhi. The N = 3 curves do fall significantly for the 
most massive objects (as do all the curves for xm ~ 1); this is 
simply because the sources are too rare (see Fig.UOt, The apparent 
drop-off at xm ~ 0.9 is an artifact of our finite box size. 

12 We remind the reader that our statistics are generated with the same 
intrinsic source field, i.e. at a fixed redshift z = 9. Since present-day simu- 
lations cannot accurately simulate the redshift evolution of xm, this is the 
cleanest way of extracting statistics on reionization, especially given that 
the ionizati on topology at fixed Sim is almost independent of redshift in 
this range iMcOuinn etal]|2007t) . Unfortunately, the real Universe is un- 
cooperative on this point, and thus observations of the different stages of 
reionization must necessarily be from different redshifts. 



We attempt to approximately remove the Poisson component 
of the clustering statistics from the excess probability by subtract- 
ing the second term in eq. {9}. However, it is not immediately obvi- 
ous that this statistic is completely independent of the LAE number 
density. Hence it is intriguing to probe the robustness of the seem- 
ing overlap of the N = 2 curves of different mass scales in Fig. 1101 
To this end, we recreated the M mln = 1.67 x 10 10 M , N = 2 
curve from Fig.QJJfor xm < 0.77, by randomly choosing halos in 
order to keep the number density constant (i.e. equal to the number 
density at xm = 0.77, as we had for the long-dashed curve in the 
right panel of Fig. [9}. We find that the excess probability remains 
fairly unchanged at xm > 0.5, seemingly suggesting that our AP 
statistic is robust (not very sensitive to the LAE number density). 
The excess probability falls off more rapidly at xm < 0.5 (where 
the excess clustering signature is weaker) than in our Figure [Tol 
though it is not clear if this trend is statistically significant since 
the error bars become large in this regime. Thus, our proposed AP 
statistic does appear to be relatively robust to the intrinsic number 
density, although the particularly close match in the TV = 2 curves 
of FigureQJJmay be somewhat coincidental. 

In Figure [TTJ we show the survey characteristics required to 
detect this excess probability, assuming N — 2 and 3' cubical 
cells. Specifically, we require AP — na > for a n-er detection, 
where AP is our derived value from Fig. [JJJ and a is the uncer- 
tainty on the measured value of AP in the given survey volume. 
Solid and dotted curves correspond to M m i n = 1-67 x 10 10 and 
3.68 x 10 10 Mq, respectively. In the bottom panel, we show the 
total number of cells containing galaxies that must be observed in 
order to detect the effect at the 3-er (top curve) and 2-er (bottom 
curve) level. It is nearly independent of halo mass at higher neutral 
fractions, because this integrated clustering measure depends only 
weakly on the characteristics of the underlying halo population (see 
the discussion above). Interestingly, a reasonably strong detection 
requires only several tens of galaxy detections, with the required 
source count decreasing as Shi increases. This is because AP is 
large compared to the raw probability of detecting two neighbor- 
ing galaxies and increases as the ionized regions get smaller. This 
indicates the power of the counts-in-cell approach: the actual num- 
ber density of objects decreases by ~ 5 over this range, but this is 
compensated by the increased probability. Many other approaches 
to clustering, such as the power spectrum, would lose sensitivity in 
this range because of the rarity of the sources. 

The top panel shows the corresponding survey volumes that 
must be observed to detect this many galaxies. (Of course, this in- 
creases rapidly with the mass threshold, even though AP does not, 
because the probability of having N = 1 is much smaller for rarer 
sources.) We can see that a survey volume of 6(12) x 10 5 Mpc 3 is 
required to detect the enhanced clustering at xm ~ 0.5-0.8, with 
a 2-ct (3-ct) accuracy and M mm = 1.67 x 10 10 Mq sensitivity. If 
the survey depth is only one cell deep, this corresponds to a several 
square degree survey at least. A deep, blind survey subtending such 
a region may be difficult in the foreseeable future, given the mod- 
est FoVs of forthcoming near-IR instruments. However, the modest 
number of cells that actually contain sources suggests that a shal- 
low but wide survey to identify extremely bright candidates, fol- 
lowed by deep followup with a large telescope to searc h for fainter 
neighb ors, may be a viable strategy. Moreover, if the IStark et al.l 
d2007l) candidates are truly at z — 9 and correspond to halos with 
M<10 9 Mq, detecting large numbers of sources may be much 
easier than our conservative estimates suggest. This is one advan- 
tage of counts-in-cells over the power spectrum, which is more 
model-dependent in such unconventional survey strategies. 
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Figure 9. The average two-point correlation function from eq. (8j as functions of xhi- Curves correspond to 3.68 X 10 10 Mg (dotted), and 1.67 X 10 10 Mq 
(short-dashed). The long-dashed curve in the right panel shows the same quantity as the short-dashed curve, but in a scenario in which (N) is held constant at 
the number density found with xhi = 0.77. The dot-dashed curve in the right panel is generated assuming Af m i n = 1.67 X 10 10 Mq, but by selecting only 
the most massive halos with number density fixed by the value at Shi = 0.77. 




Figure 10. Excess probability (over that in an ionized universe normalized to the same number density of LAEs) that a FoV containing at least one LAE will 
contain N or more LAEs. Panels correspond to FoV of 1.5' X 1.5' (left) and 3' X 3' (right). Solid and dotted curves assume M m ; n = 1.67 X 10 10 and 
3.68 X 10 10 Mq, respectively. Thick and thin curves correspond to N = 2 and 3, respectively. Error bars indicate the 1-cr Poisson uncertainty on the N = 2, 
Af min = 1.67 X 10 10 Af min curves. 



Throughout our discussion, we have used somewhat arbitrary 
cell sizes, as surveys can be broken down into small cells for analy- 
sis, and the optimal choice will depend on the characteristics of the 
particular survey. However, in the absence of spectroscopic follow- 
up, the LAE redshift might only be localized to the width of a nar- 
row band filter. This sets a minimum cell size in the LOS direction. 
With this in mind, in Fig. [12] we plot the same in-cell statistics as 
in Fig. Qj] but extending the LOS axis of the cells to correspond 
to a narrow band filter with R — A/ A A ~ 100. Comparing to 
Fig.QT] the required number of cells and survey volume increases 
by less than a factor of ~ 2. Essentially, so long as the cells are not 
so long that random, distant neighbors overwhelm the reionization 
clustering, cells of any radial size can be used without a substantial 
increase in the survey requirements. 

As mentioned previously, counts-in-cells (and specifically our 
AP statistic) can make use of higher-order correlations. In Fig. 1131 
we plot the same statistics as in Fig. Qj] but using clustering of the 
N = 3 term. Interestingly, at least for low-mass halos, measuring 
this term is no more difficult than measuring the two-point statis- 
tic. These higher-order correlations have not yet been quantified in 
the context of reionization, so we also show (with the dot-dashed 
curve) the requirements (assuming M m i n = 1.67 x 10 10 Mq) to 



detect three-point clustering without reionization, assuming that a 
survey probes all halos above a given mass threshold, normalized to 
the same number density. (In other words, this is the requirement to 
detect three point clustering from just gravitational instability; for 
this nearly gaussian random field it contains little more information 
than the power spectrum itself.) The = 3 term is much weaker 
in this case and requires a considerably larger survey to measure. 
Thus, detection of this three-point term may provide powerful sup- 
port for any detection of reionization-induced clustering. 

Note that we have ignored foreground contamination here. 
This is likely a serious issue in any survey, but the details of fore- 
ground removal depend on the survey specifics. Hence we defer it 
to future works, more focused on particular surveys. 

As mentioned previously, stochasticity in the L <-» M map- 
ping, and secondary effects on the detectability of the Lya line, 
such as galactic winds and gas infall, can in principle affect the 
details of our counts in-cell estimates above. We defer a thorough 
study of such effects to future works involving numerical simula- 
tions. 
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Figure 11. Survey characteristics required to detect the excess clustering 
probability due to reionization (eq.|9j assuming M m j n = 1.67 X 10 10 
Mq (solid curves) and 3.68 X 10 10 Mq (dotted curves), and N = 2. 
The top panel shows the survey volume required for 3-cr (top curve) and 
2-cr (bottom curve) detections; the bottom panel shows the corresponding 
number of cells that actually contain galaxies. 



^ 10 



3'x3'x25 Mpc (JWST narrow band) 




Figure 12. Same as Fig. 1111 but with a cell size typical of narrow band 
LAE surveys, with R = A/ A A ~ 100. 



Figure 13. Same as Fig. [TT] but with TV = 3. The dot-dashed curve is 
generated assuming M m ; n = 1.67 X 10 10 Mq, but by selecting only the 
most massive halos, normalized to the same number density at each Shi- 

6 CONCLUSIONS 

In this work, we investigate the use of LAEs in constraining 
reionization at z ~ 9. We generate 2 = 9 halo, velocity and 
density fields in a 250 Mpc "semi-numerical" simulation box 
(Mesinger & Furlanettol20073) . Our excursion-set approach allows 
us to resolve halos with masses M > 2.2 x 10 s Mq. We construct 
ionization topologies corresponding to various values of xm- 

As a first step, we generate damping wing Lya optical depth 
distributions for the halos in our simulations. As expected, the more 
massive halos have narrower distributions with smaller mean ab- 
sorption, though the distributions become weaker functions of halo 
mass as reionization progresses. We show that these distributions 
are roughly lognormal but are broader and have a smaller mean 
than purely analytic predictions. 

Using the td distributions, we generate 2 = 9 LAE lumi- 
nosity functions for various values of xm- Constraining xm with 
luminosity functions is difficult due to the many uncertainties inher- 
ent in the host halo mass «-» Lya luminosity mapping. However, we 
show that ap plying a very cons ervative mapping to the number den- 
sities of the Star k et al. I J2007b sample yields xm{z = 9) < 0.7. 
More fundamentally, these LAE number densities, if genuine, re- 
quire substantial star formation in halos with M < 10 9 Mq, mak- 
ing them unique among the current sample of observed high-2 ob- 
jects. 

The topology of reionization increases the apparent clustering 
of the observed LAEs, aside from merely suppressing their number 
densities. We investigate the detectability of this signature using 
"counts-in-cell" statistics, which are more robust than the power- 
spectrum at studying such non-gaussian fields and (as integrated 
measures) are more straightforward to interpret in the few source 
limit. We find that the likelihood of observing more than one LAE 
among the subset of fields which contain LAEs is >10% greater 
in a universe with xm > 0.5 than in an ionized universe with the 
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same LAE number density. We show that this effect can be detected 
at z ~ 9 with just a few tens of 3' cubical cells containing galaxies, 
regardless of the underlying host halo mass. 

Counts-in-cells is only one approach to clustering, and it 
has advantages and disadvantages compared to the more common 
powe r spectrum (or correlation function) approach dMcOuinn et al.l 
120071) . The latter accounts for the detailed scale dependence of the 
clustering enhanceme nt (and so in principle provides information 
on the ionization field: iFurlanetto et alj|2006h but does not include 
higher-order corrections to the clustering. Moreover, robust power 
spectrum measurements require a large number of sources to be 
detected and are relatively unforgiving of ad hoc survey strate- 
gies, such as deep followup, which may be required when sources 
are rare. The small FoVs of planned near-IR instruments therefore 
make such measurements difficult at high redshifts. 

By contrast, our counts-in-cells approach offers very little in- 
formation on the scale dependence of the ionization field. However, 
it does include non-gaussianities, which become important early in 
reionization (Shi -* 0.5). We explicitly showed, for the first time, 
that reionization induces non-gaussianities in the galaxy distribu- 
tion that should be separable from structure formation, at least in 
the deep survey limit. As a result, our signature becomes more 
powerful earlier in the reionization process, more than compen- 
sating for the declining apparent number density of LAEs. It also 
places very few constraints on the survey geometry, because the 
fields need not be contiguous (and in fact, to compensate for cos- 
mic variance, probably should not be). An ideal strategy may be to 
identify particularly bright candidate objects with a wide, shallow 
survey. Then, one can follow up these candidates with deeper inte- 
grations to confirm their identity and search for fainter neighbors. 
Given that only a few tens of sources must be followed-up to pro- 
vide interesting constraints, such a strategy would require relatively 
modest telescope resources. 
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